computational geometry


Persistence-based topological optimization: a survey

Carriere, Mathieu, Ike, Yuichi, Lacombe, Théo, Nishikawa, Naoki

arXiv.org Machine Learning

Computational topology provides a tool, persistent homology, to extract quantitative descriptors from structured objects (images, graphs, point clouds, etc.). These descriptors can then be involved in optimization problems, typically as a way to incorporate topological priors or to regularize machine learning models. This is usually achieved by minimizing adequate, topologically informed losses based on these descriptors, which in turn raises theoretical and practical questions about the possibility of optimizing such loss functions using gradient-based algorithms. This has been an active research field in the topological data analysis community over the last decade, and various techniques have been developed to enable the optimization of persistence-based loss functions with gradient descent schemes. This survey presents the current state of the field, covering its theoretical foundations and algorithmic aspects, and showcasing practical uses in several applications. It includes a detailed introduction to persistence theory and, as such, aims to be accessible to mathematicians and data scientists who are new to the field. It is accompanied by an open-source library that implements the different approaches covered in the survey, providing a convenient playground for researchers to get familiar with the field.
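To make the notion of a persistence-based loss concrete, here is a minimal standalone sketch (independent of the survey's accompanying library, whose API is not reproduced here). It relies on a standard fact: the 0-dimensional persistence of a Vietoris-Rips filtration on a point cloud is given by the minimum spanning tree edge lengths, and "total persistence" is one of the simplest descriptor-based quantities one could then minimize.

```python
import itertools
import math

def total_persistence_0d(points):
    """Total 0-dimensional persistence of the Vietoris-Rips filtration
    of a point cloud: every connected component is born at time 0 and
    dies when a minimum-spanning-tree edge merges it, so the death
    times are exactly the MST edge lengths (Kruskal + union-find)."""
    n = len(points)
    # All pairwise edges, sorted by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components: a death
            parent[ri] = rj
            deaths.append(w)
    return sum(deaths), deaths
```

Minimizing `total_persistence_0d` over point coordinates (e.g. with automatic differentiation, as discussed in the survey) pulls the points together; maximizing it spreads them apart, which is the basic mechanism behind topological regularization.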


Hardness of High-Dimensional Linear Classification

Munteanu, Alexander, Omlor, Simon, Phillips, Jeff M.

arXiv.org Machine Learning

We establish new exponential-in-dimension lower bounds for the Maximum Halfspace Discrepancy problem, which models linear classification. In both its exact and approximate forms, it is a fundamental problem in computational geometry and machine learning. However, only $O(n^d)$ and, respectively, $\tilde O(1/\varepsilon^d)$ upper bounds are known, complemented by polynomial lower bounds that do not support the exponential dependence on the dimension. We close this gap up to polylogarithmic terms by reduction from widely believed hardness conjectures for Affine Degeneracy testing and $k$-Sum problems. Our reductions yield matching lower bounds of $\tilde{\Omega}(n^d)$ and, respectively, $\tilde{\Omega}(1/\varepsilon^d)$ based on Affine Degeneracy testing, and $\tilde{\Omega}(n^{d/2})$ and, respectively, $\tilde{\Omega}(1/\varepsilon^{d/2})$ conditioned on $k$-Sum. The first bound also holds unconditionally if the computational model is restricted to sidedness queries, a widespread setting implemented and optimized in many contemporary algorithms and computing paradigms.
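As a point of reference for the exact $O(n^d)$ upper bound, a brute-force computation for the plane ($d = 2$) can be sketched as follows. The function name and the signed-label formulation of discrepancy are illustrative assumptions, and the points are assumed to be in general position; the idea is that an optimal halfspace boundary can be taken through two input points, so sweeping a threshold along each normal perpendicular to a pair direction covers every combinatorially distinct halfspace.

```python
import itertools

def max_halfspace_discrepancy(points, labels):
    """Brute-force Maximum Halfspace Discrepancy in the plane.
    points: list of (x, y); labels: +1/-1 per point. The discrepancy of
    a halfspace is |sum of labels of the points it contains|. Runtime
    O(n^3 log n): one sorted sweep per pair of points."""
    total = sum(labels)
    best = abs(total)                 # the full halfspace ("empty" gives 0)
    for i, j in itertools.combinations(range(len(points)), 2):
        dx = points[j][0] - points[i][0]
        dy = points[j][1] - points[i][1]
        nx, ny = -dy, dx              # normal perpendicular to this pair
        proj = sorted(
            (nx * p[0] + ny * p[1], l) for p, l in zip(points, labels)
        )
        running = 0
        for _, l in proj:             # sweep the boundary threshold
            running += l
            # running = one side of the boundary; total - running = other
            best = max(best, abs(running), abs(total - running))
    return best
```

For a perfectly separable instance the discrepancy equals the larger class size, which gives a quick sanity check on the sweep.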



Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding

Neural Information Processing Systems






Multi-Covering a Point Set by $m$ Disks with Minimum Total Area

Guitouni, Mariem, Loi, Chek-Manh, Fekete, Sándor P., Perk, Michael, Becker, Aaron T.

arXiv.org Artificial Intelligence

Abstract-- A common robotics sensing problem is to place sensors to robustly monitor a set of assets, where robustness is assured by requiring asset p to be monitored by at least κ(p) sensors. Given n assets that must be observed by m sensors, each with a disk-shaped sensing region, where should the sensors be placed to minimize the total area observed? We provide and analyze a fast heuristic for this problem. Subsequently, we enforce separation constraints between the sensors by modifying the integer program formulation and by changing the disk candidate set.

I. Introduction

Coordinating different kinds of robotic assets is a natural challenge when it comes to problems of surveillance and guidance. As shown in Figure 1, this gives rise to scenarios in which a finite set of drones with downward communication links must maintain control of a finite set of ground assets [7], [15], choosing a set of different altitudes that balances safe separation between drones with reliable communication to the ground. The latter requires sufficient signal strength, so communication areas (and thus energy consumption) depend quadratically on the altitude.
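The paper's fast heuristic is not spelled out in this excerpt. As an illustration of the underlying set-multicover structure only (the candidate set, function names, and selection rule below are assumptions, not the authors' method), a generic greedy that repeatedly picks the candidate disk with the smallest area per unit of remaining demand might look like this:

```python
import math

def greedy_disk_multicover(assets, kappa, candidates):
    """Generic greedy for multi-covering assets by disks.
    assets: list of (x, y) points; kappa: required coverage per asset;
    candidates: list of disks (cx, cy, r). Repeatedly choose the unused
    disk minimizing area / (number of still-deficient assets it covers)."""
    demand = {p: kappa[p] for p in assets}
    chosen = []
    while any(demand.values()):
        best_disk, best_ratio = None, float("inf")
        for (cx, cy, r) in candidates:
            if (cx, cy, r) in chosen:      # each candidate used at most once
                continue
            gain = sum(
                1 for (px, py) in assets
                if demand[(px, py)] > 0
                and math.hypot(px - cx, py - cy) <= r
            )
            if gain > 0 and math.pi * r * r / gain < best_ratio:
                best_disk, best_ratio = (cx, cy, r), math.pi * r * r / gain
        if best_disk is None:              # candidate set cannot finish
            break
        chosen.append(best_disk)
        cx, cy, r = best_disk
        for (px, py) in assets:
            if demand[(px, py)] > 0 and math.hypot(px - cx, py - cy) <= r:
                demand[(px, py)] -= 1
    return chosen, demand
```

The paper additionally enforces separation between sensors via the integer program and the candidate set; such constraints could be imposed here by filtering `candidates` against the disks already in `chosen` before each pick.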


Enhancing binary classification: A new stacking method via leveraging computational geometry

Wu, Wei, Tang, Liang, Zhao, Zhongjie, Teo, Chung-Piaw

arXiv.org Artificial Intelligence

Binary classification is a fundamental task in machine learning and data science, with applications spanning numerous domains, including spam detection, medical diagnostics, image recognition, and credit scoring. The goal is to predict a binary outcome--typically labeled as 0 or 1--based on a set of input features. Various machine learning algorithms, such as logistic regression (LR), k-nearest neighbors (kNN), support vector machines (SVM), and neural networks (NN), are commonly employed for binary classification tasks. These algorithms can be broadly divided into two categories: interpretable ones, which are convenient to analyze and control (e.g., LR), and non-interpretable ones with potentially better classification performance (e.g., NN). Ensemble learning, a powerful technique in predictive modeling, has gained widespread recognition for its ability to improve model performance by combining the strengths of multiple learning algorithms [1]. Among ensemble methods, stacking stands out by integrating the predictions of diverse base models (different learning algorithms) through a meta-model, resulting in enhanced prediction accuracy compared to using only the best base model [2]. Stacking has demonstrated significant applications in classification problems such as network intrusion detection [3, 4], cancer type classification [5], credit lending [6], and protein-protein binding affinity prediction [7].
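A stacking ensemble of the base learners listed above can be sketched with scikit-learn's `StackingClassifier`, which fits the meta-model on out-of-fold base predictions. The synthetic dataset and hyperparameters below are arbitrary illustrations, not the paper's experimental setup or its geometry-based method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data (sizes chosen arbitrarily).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three of the base learners mentioned above, combined by a logistic
# regression meta-model; cv=5 produces the out-of-fold base predictions
# the meta-model is trained on.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

The key design point is the cross-validated base predictions: fitting the meta-model on in-sample base outputs would let it overfit to the base models' training errors.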


Online Epsilon Net and Piercing Set for Geometric Concepts

Bhore, Sujoy, Dey, Devdan, Singh, Satyam

arXiv.org Artificial Intelligence

VC-dimension and $\varepsilon$-nets are key concepts in Statistical Learning Theory. Intuitively, VC-dimension is a measure of the size of a class of sets. The famous $\varepsilon$-net theorem, a fundamental result in Discrete Geometry, asserts that if the VC-dimension of a set system is bounded, then a small sample exists that intersects all sufficiently large sets. In online learning scenarios where data arrives sequentially, the VC-dimension helps to bound the complexity of the set system, and $\varepsilon$-nets ensure the selection of a small representative set. This sampling framework is crucial in various domains, including spatial data analysis, motion planning in dynamic environments, optimization of sensor networks, and feature extraction in computer vision, among others. Motivated by these applications, we study the online $\varepsilon$-net problem for geometric concepts with bounded VC-dimension. While the offline version of this problem has been extensively studied, surprisingly, there are no known theoretical results for the online version to date. We present the first deterministic online algorithm with an optimal competitive ratio for intervals in $\mathbb{R}$. Next, we give a randomized online algorithm with a near-optimal competitive ratio for axis-aligned boxes in $\mathbb{R}^d$, for $d\le 3$. Furthermore, we introduce a novel technique to analyze similar-sized objects of constant description complexity in $\mathbb{R}^d$, which may be of independent interest. Next, we focus on the continuous version of this problem, where ranges of the set system are geometric concepts in $\mathbb{R}^d$ arriving in an online manner, but the universe is the entire space, and the objective is to choose a small sample that intersects all the ranges.
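For intuition on the online setting for intervals in $\mathbb{R}$, the classic greedy, which adds the right endpoint of any arriving interval that is not yet pierced, yields a simple online piercing set. This is a textbook baseline sketched for illustration, not necessarily the paper's optimally competitive algorithm.

```python
def online_interval_piercing(stream):
    """Online piercing set for closed intervals on the line: when an
    interval arrives that no previously chosen point pierces, commit
    to its right endpoint. Decisions are irrevocable, as required in
    the online model."""
    points = []
    for (l, r) in stream:
        if not any(l <= p <= r for p in points):
            points.append(r)
    return points
```

Choosing the right endpoint is the natural move: it pierces the current interval while staying as far right as possible, maximizing the chance of also piercing later overlapping intervals.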